Wide-Coverage Grammar Extraction from Thai Treebank
Authors
Abstract
Parsing is an important step in natural language understanding, including phrase alignment to support statistical machine translation. A parser's ability to analyse real text depends strongly on its grammar, and a treebank is one possible source for grammar extraction. However, treebank construction relies largely on human annotators' intuitions, and differing intuitions among multiple annotators introduce inconsistencies into the treebank. In this paper, we propose a method for constructing a treebank with semi-automatic correction. Furthermore, we use the grammar extracted from our corrected treebank to improve the accuracy of semi-automatic phrase structure annotation in the next round of incremental treebank construction, which reduces the manual labour spent on phrase structure annotation. From the corrected treebanks, we can extract a wider-coverage grammar to support the parser.
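The abstract does not describe how rules are read off the corrected treebank. The sketch below shows one standard way to do it, assuming Penn-style bracketed trees: every local tree A -> B1 ... Bn becomes a context-free rule, rule probabilities are estimated by relative frequency, and coverage is measured as the share of rule occurrences in an unseen tree that the grammar already contains. The function names, the toy tag set, and this particular coverage measure are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter

# A tree is (label, children); each child is either a subtree or a word string.


def parse_bracketed(s):
    """Parse a Penn-style bracketed string such as '(S (NP (N w)) (VP (V w)))'."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse())
            else:
                children.append(tokens[pos])  # a leaf word
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return parse()


def extract_rules(tree, counts):
    """Add one count for every production A -> B1 ... Bn found in the tree."""
    label, children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    counts[(label, rhs)] += 1
    for child in children:
        if isinstance(child, tuple):
            extract_rules(child, counts)
    return counts


def relative_frequencies(counts):
    """Estimate P(A -> beta) = count(A -> beta) / count(A)."""
    lhs_totals = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}


def rule_coverage(grammar, tree):
    """Fraction of rule occurrences in `tree` that the grammar already contains."""
    rules = extract_rules(tree, Counter())
    total = sum(rules.values())
    covered = sum(n for rule, n in rules.items() if rule in grammar)
    return covered / total


if __name__ == "__main__":
    # Hypothetical toy treebank; the tag set and words are placeholders.
    treebank = [
        "(S (NP (N cat)) (VP (V eats) (NP (N fish))))",
        "(S (NP (N dog)) (VP (V sleeps)))",
    ]
    counts = Counter()
    for line in treebank:
        extract_rules(parse_bracketed(line), counts)
    probs = relative_frequencies(counts)
    print(probs[("VP", ("V", "NP"))])  # 0.5
    new_tree = parse_bracketed("(S (NP (N cat)) (VP (V sleeps) (NP (N bird))))")
    print(rule_coverage(counts, new_tree))  # 6/7, about 0.857
```

A coverage below 1.0 points to rules the grammar has not yet seen (here only the lexical rule N -> bird); in an incremental setting, those gaps are where newly corrected trees would widen the grammar.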
Related papers
Building Deep Dependency Structures with a Wide-Coverage CCG Parser
This paper describes a wide-coverage statistical parser that uses Combinatory Categorial Grammar (CCG) to derive dependency structures. The parser differs from most existing wide-coverage treebank parsers in capturing the long-range dependencies inherent in constructions such as coordination, extraction, raising and control, as well as the standard local predicate-argument dependencies. A set o...
Semi-automated Extraction of a Wide-Coverage Type-Logical Grammar for French
The paper describes the development of a wide-coverage type-logical grammar for French, which has been extracted from the Paris 7 treebank and received a significant amount of manual verification and cleanup. The resulting treebank is evaluated using a supertagger and performs at a level comparable to the best supertagging results for English. French abstract: This article describes the development of a g...
Automatic Transformation of the Thai Categorial Grammar Treebank to Dependency Trees
A method for deriving an approximately labeled dependency treebank from the Thai Categorial Grammar Treebank has been implemented. The method involves a lexical dictionary for assigning dependency directions to the CG types associated with the grammatical entities in the CG bank, falling back on a generic mapping of CG types in case of unknown words. Currently, all but a handful of the trees in...
Automated Extraction of Tree Adjoining Grammars from a Treebank for Vietnamese
In this paper, we present a system that automatically extracts lexicalized tree adjoining grammars (LTAG) from treebanks. We first discuss in detail extraction algorithms and compare them to previous works. We then report the first LTAG extraction result for Vietnamese, using a recently released Vietnamese treebank. The implementation of an open source and language independent system for automa...